Implementing Cross-Language Text Retrieval Systems for Large-scale Text Collections and the World Wide Web
Abstract
QUILT (Query User Interface with Light Translations) is a prototype implementation of a complete cross-language text retrieval (CLTR) system that takes English queries and produces English gloss translations of Spanish documents. The system indexes the Spanish documents in Spanish, but converts the English query into a Spanish equivalent set through a novel combination of lexical methods and parallel-corpus disambiguation. Similar methods are applied to the returned document to produce a simple translation that can be examined by non-Spanish speakers to gauge the relevance of the document to the original English query. The system integrates traditional, glossary-based machine translation technology with information retrieval approaches and demonstrates that relatively simple term substitution and disambiguation approaches can be viable for cross-language text retrieval. Components of QUILT have been used to build a CLTR interface to WWW-based search services.
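The query-translation idea in the abstract (glossary term substitution followed by corpus-based disambiguation) can be illustrated with a minimal sketch. This is not QUILT's actual implementation, which the abstract does not detail. The glossary, the tiny Spanish corpus, and the function names `cooccurrence_counts` and `translate_query` are all hypothetical, and simple sentence-level co-occurrence counting stands in for whatever disambiguation the system really uses.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy English-to-Spanish glossary (illustration only).
GLOSSARY = {
    "bank": ["banco", "orilla"],
    "interest": ["interés"],
    "rate": ["tasa", "ritmo"],
}

# Hypothetical Spanish side of a parallel corpus (illustration only).
SPANISH_CORPUS = [
    "el banco subió la tasa de interés",
    "la tasa de interés del banco central",
    "caminamos por la orilla del río",
]


def cooccurrence_counts(sentences):
    """Count how often each pair of distinct words shares a sentence."""
    counts = Counter()
    for sentence in sentences:
        words = sorted(set(sentence.split()))
        for a, b in combinations(words, 2):
            counts[(a, b)] += 1  # key is an alphabetically sorted pair
    return counts


def translate_query(query, glossary, counts):
    """Pick one Spanish candidate per English term.

    Each candidate is scored by how often it co-occurs with the
    candidates of the other query terms. Terms missing from the
    glossary are dropped.
    """
    terms = [t for t in query.lower().split() if t in glossary]
    chosen = []
    for term in terms:
        others = [c for o in terms if o != term for c in glossary[o]]

        def score(candidate):
            return sum(counts[tuple(sorted((candidate, o)))] for o in others)

        chosen.append(max(glossary[term], key=score))
    return chosen


counts = cooccurrence_counts(SPANISH_CORPUS)
print(translate_query("bank interest rate", GLOSSARY, counts))
```

With this toy data, "bank" resolves to "banco" rather than "orilla" because "banco" co-occurs with "tasa" and "interés" in the corpus. The selected Spanish terms could then be passed to a Spanish-language index as the query.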
Similar resources
Ijaz: An Operational System for Single-Document Summarization of Persian News Texts
The rapid growth of published documents on the web has created new demands for processing, classification, and information retrieval, so the use of natural language processing tools has increased around the world. Automatic summarization is known as the core of a wide range of text-processing tools such as decision systems, accountability systems, search engines, etc. And always has been inv...
Cortina: A System for Large-scale, Content-based Web Image Retrieval and the Semantics within
Recent advances in processing and networking capabilities of computers have led to an accumulation of immense amounts of multimedia data such as images. One of the largest repositories for such data is the World Wide Web. There is an urgent need for systems that allow users to search these vast online collections. We present Cortina, a large-scale image retrieval system for the World Wide Web. It h...
Knowledge discovery in the Internet
With the rapid expansion of the World Wide Web, the need for efficient data retrieval strategies is becoming stronger and will keep growing. Unfortunately, classical information retrieval techniques, developed for well-organized collections of textual data, do not seem able to cope with the diversity and amount of information available throughout the Internet. This paper presents some of the ne...
Exploiting the Web as Parallel Corpora for Cross-Language Information Retrieval
The expansion of the Web creates more requirements for Cross-Language Information Retrieval (CLIR). Query translation is the key problem. Previous studies have shown that query translation can be done by exploiting a large set of parallel texts. However, the problem that arises is the unavailability of large parallel corpora for many languages. In this paper, we describe a mining system that automat...
Information Retrieval on the Web
For the information retrieval (IR) community, the Web now presents a new paradigm, while also generating new challenges and attracting growing interest from around the world. An important example of these challenges is managing huge text collections and evaluating the usefulness of hyperlinks contained within them.
Publication date: 2002